Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Author: Blum Avrim
Stangl Kevin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 1st Symposium on Foundations of Responsible Computing (FORC 2020)
Publication date: 02/12/2019

Multiple fairness constraints have been proposed in the literature, motivated by a range of concerns about how demographic groups might be treated unfairly by machine learning classifiers. In this work we consider a different motivation: learning from biased training data. We posit several ways in which training data may be biased, including a noisier or negatively biased labeling process for members of a disadvantaged group, a decreased prevalence of positive or negative examples from the disadvantaged group, or both. Given such biased training data, Empirical Risk Minimization (ERM) may produce a classifier that is not only biased but also has suboptimal accuracy on the true data distribution. We examine the ability of fairness-constrained ERM to correct this problem. In particular, we find that the Equal Opportunity fairness constraint [Hardt et al., 2016] combined with ERM will provably recover the Bayes optimal classifier under a range of bias models. We also consider other recovery methods, including re-weighting the training data, Equalized Odds, Demographic Parity, and Calibration. These theoretical results provide additional motivation for considering fairness interventions even if an actor cares primarily about accuracy.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
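To make "Equal Opportunity combined with ERM" concrete, here is a minimal brute-force sketch, not the paper's method. It assumes a hypothetical hypothesis class of per-group thresholds on a real-valued score and searches it exhaustively for the rule with the lowest training error whose group true positive rates differ by at most `slack`. The names `error_and_tpr` and `equal_opportunity_erm` are illustrative, and each group is assumed to contain at least one positive example. The sketch does not reproduce the paper's bias models or its recovery guarantees.

```python
import math
from itertools import product
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def error_and_tpr(scores: Sequence[float], labels: Sequence[int],
                  groups: Sequence[Hashable],
                  thresholds: Dict[Hashable, float]) -> Tuple[float, Dict[Hashable, float]]:
    """Training error and per-group true positive rate of a per-group threshold rule."""
    preds = [1 if s >= thresholds[g] else 0 for s, g in zip(scores, groups)]
    error = sum(p != y for p, y in zip(preds, labels)) / len(labels)
    tpr = {}
    for g in thresholds:
        # Predictions on the truly positive members of group g.
        hits = [p for p, y, gg in zip(preds, labels, groups) if gg == g and y == 1]
        tpr[g] = sum(hits) / len(hits) if hits else 0.0  # assumption: groups have positives
    return error, tpr


def equal_opportunity_erm(scores: Sequence[float], labels: Sequence[int],
                          groups: Sequence[Hashable],
                          slack: float = 0.0) -> Optional[Tuple[float, Dict[Hashable, float]]]:
    """Exhaustive ERM over per-group thresholds, subject to max TPR gap <= slack."""
    names = sorted(set(groups))
    candidates: List[float] = sorted(set(scores)) + [math.inf]  # inf = never predict 1
    best = None
    for combo in product(candidates, repeat=len(names)):
        thresholds = dict(zip(names, combo))
        error, tpr = error_and_tpr(scores, labels, groups, thresholds)
        gap = max(tpr.values()) - min(tpr.values())
        if gap <= slack and (best is None or error < best[0]):
            best = (error, thresholds)
    return best
```

The search is exponential in the number of groups, so it is only meant for illustrating the constraint on toy data; with two groups it tries every pair of candidate thresholds.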

Fast Private Data Release Algorithms for Sparse Queries

Author: Blum Avrim
Roth Aaron
Publication date: 29/11/2011

We revisit the problem of accurately answering large classes of statistical queries while preserving differential privacy. Previous approaches to this problem have either been very general but not had run-time polynomial in the size of the database, applied only to very limited classes of queries, or relaxed the notion of worst-case error guarantees. In this paper we consider the large class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive settings. Our algorithms also achieve better accuracy bounds than previous general techniques do when applied to sparse queries: our bounds are independent of the universe size. In fact, even the run-time of our interactive mechanism is independent of the universe size, and so it can be implemented in the "infinite universe" model, in which the data curator need not specify any finite universe.

arXiv.org e-Print Archive

CiteSeerX
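Why sparsity makes the universe size irrelevant can be seen even in the simplest private mechanism. The sketch below is the standard Laplace mechanism for a single normalized linear query, not the paper's query release algorithms. It assumes a sparse query is stored as a dictionary over its support (values in [0, 1]), so evaluating it touches only database rows and never enumerates the universe. The helper names are hypothetical.

```python
import random
from typing import Dict, Hashable, Sequence


def sparse_query_value(query: Dict[Hashable, float], database: Sequence[Hashable]) -> float:
    """Normalized answer of a sparse linear query; elements outside its support count as 0."""
    return sum(query.get(x, 0.0) for x in database) / len(database)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_answer(query: Dict[Hashable, float], database: Sequence[Hashable],
                   epsilon: float, rng: random.Random) -> float:
    """epsilon-DP answer to one query, assuming query values lie in [0, 1].

    Replacing one row changes the normalized answer by at most 1/n, so the
    sensitivity is 1/n and the noise scale is (1/n) / epsilon.
    """
    sensitivity = 1.0 / len(database)
    return sparse_query_value(query, database) + laplace_noise(sensitivity / epsilon, rng)
```

Answering many queries this way consumes privacy budget per query; avoiding that cost for large query classes is exactly what the paper's algorithms address.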

Advancing Subgroup Fairness via Sleeping Experts

Author: Blum Avrim
Lykouris Thodoris
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 02/12/2019

We study methods for improving fairness to subgroups in settings with overlapping populations and sequential predictions. Classical notions of fairness focus on the balance of some property across different populations. However, in many applications the goal of the different groups is not to be predicted equally but rather to be predicted well. We demonstrate that the task of satisfying this guarantee for multiple overlapping groups is not straightforward, and show that for the simple objective of the unweighted average of the false negative and false positive rates, satisfying this guarantee for overlapping populations can be statistically impossible even when we are given predictors that perform well separately on each subgroup. On the positive side, we show that this goal is achievable when individuals are equally important to the different groups they belong to; to do so, we draw a connection to the sleeping experts literature in online learning. Motivated by the one-sided feedback in natural settings of interest, we extend our results to such a feedback model. We also provide a game-theoretic interpretation of our results, examining the incentives of participants to join the system and to provide the system full information about predictors they may possess. We end with several interesting open problems concerning the strength of guarantees that can be achieved in a computationally efficient manner.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
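For readers unfamiliar with sleeping experts, here is a minimal sketch of a generic multiplicative-weights update in that setting, not the algorithm from the paper. In one hypothetical mapping, each expert is a group's predictor and is "awake" only when the current individual belongs to that group. The sketch assumes predictions and outcomes in [0, 1], absolute loss, and at least one awake expert per round. Only awake experts are updated, and their total weight is renormalized so that sleeping experts are unaffected.

```python
from typing import Dict, Iterable, List, Set, Tuple

Round = Tuple[Set[int], Dict[int, float], float]  # (awake experts, their predictions, outcome)


def sleeping_experts(rounds: Iterable[Round], n_experts: int,
                     beta: float = 0.5) -> Tuple[List[float], List[float]]:
    """Return the learner's prediction in each round and the final expert weights."""
    weights = [1.0] * n_experts
    learner_preds = []
    for awake, preds, outcome in rounds:
        total = sum(weights[i] for i in awake)  # assumes `awake` is non-empty
        # Predict with the weighted average of the awake experts only.
        learner_preds.append(sum(weights[i] * preds[i] for i in awake) / total)
        # Penalize each awake expert multiplicatively by its absolute loss.
        for i in awake:
            weights[i] *= beta ** abs(preds[i] - outcome)
        # Rescale so the awake experts keep their combined weight.
        new_total = sum(weights[i] for i in awake)
        for i in awake:
            weights[i] *= total / new_total
    return learner_preds, weights
```

The renormalization step is the key design choice: an expert that sleeps through a round neither gains nor loses relative standing, which is what allows per-group guarantees on the rounds where that group is present.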

On Statistical Query Sampling and NMR Quantum Computing

Author: Blum Avrim
Yang Ke
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

We introduce a "Statistical Query Sampling" model, in which the goal of an algorithm is to produce an element in a hidden set $S \subseteq \{0,1\}^n$ with reasonable probability. The algorithm gains information about $S$ through oracle calls (statistical queries), where the algorithm submits a query function $g(\cdot)$ and receives an approximation to $\Pr_{x \in S}[g(x)=1]$. We show how this model is related to NMR quantum computing, in which only statistical properties of an ensemble of quantum systems can be measured, and in particular to the question of whether one can translate standard quantum algorithms to the NMR setting without putting all of their classical post-processing into the quantum system. Using Fourier analysis techniques developed in the related context of statistical query learning, we prove a number of lower bounds (both information-theoretic and cryptographic) on the ability of algorithms to produce an $x \in S$, even when the set $S$ is fairly simple. These lower bounds point out a difficulty in efficiently applying NMR quantum computing to algorithms such as Shor's and Simon's algorithms that involve significant classical post-processing. We also explicitly relate the notion of statistical query sampling to that of statistical query learning. An extended abstract appeared in the 18th Annual IEEE Conference on Computational Complexity (CCC 2003), 2003.

Keywords: statistical query, NMR quantum computing, lower bound

Comment: 17 pages, no figures.

arXiv.org e-Print Archive

Crossref

CERN Document Server
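The oracle interface in this model can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction. The hypothetical `make_sq_oracle` answers $\Pr_{x \in S}[g(x)=1]$ up to additive tolerance `tau`; here the error is random, whereas the model allows any error within the tolerance. The naive `coordinatewise_sample` queries $g(x) = x_i$ for each bit and rounds. It recovers $S$'s element when $S$ is a singleton, but it is not a general sampler: for $S = \{00, 11\}$ every coordinate has probability exactly $1/2$, so rounding can output $01 \notin S$.

```python
import random
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]


def make_sq_oracle(S: Sequence[Bits], tau: float,
                   rng: random.Random) -> Callable[[Callable[[Bits], int]], float]:
    """Hypothetical SQ oracle: Pr_{x in S}[g(x) = 1] perturbed by at most tau."""
    def oracle(g: Callable[[Bits], int]) -> float:
        exact = sum(g(x) for x in S) / len(S)  # uniform distribution over S
        return exact + rng.uniform(-tau, tau)
    return oracle


def coordinatewise_sample(oracle: Callable[[Callable[[Bits], int]], float], n: int) -> Bits:
    """Query each coordinate's marginal and round it; correct only for simple S."""
    bits = []
    for i in range(n):
        # g(x) = x_i; the default argument binds the current i.
        estimate = oracle(lambda x, i=i: x[i])
        bits.append(1 if estimate > 0.5 else 0)
    return tuple(bits)
```

The algorithm only ever sees noisy averages over $S$, never an individual element, which is the restriction that mirrors ensemble measurement in NMR.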

Center-based Clustering under Perturbation Stability

Author: Awasthi Pranjal
Blum Avrim
Sheffet Or
Publication date: 11/08/2011

Clustering under most popular objective functions is NP-hard, even to approximate well, and so unlikely to be efficiently solvable in the worst case. Recently, Bilu and Linial \cite{Bilu09} suggested an approach aimed at bypassing this computational barrier by using properties of instances one might hope to hold in practice. In particular, they argue that instances in practice should be stable to small perturbations in the metric space, and they give an efficient algorithm for clustering instances of the Max-Cut problem that are stable to perturbations of size $O(n^{1/2})$. In addition, they conjecture that instances stable to as little as $O(1)$ perturbations should be solvable in polynomial time. In this paper we prove that this conjecture is true for any center-based clustering objective (such as $k$-median, $k$-means, and $k$-center). Specifically, we show that we can efficiently find the optimal clustering assuming only stability to factor-3 perturbations of the underlying metric in spaces without Steiner points, and stability to factor $2+\sqrt{3}$ perturbations for general metrics. In particular, we show that for such instances the popular Single-Linkage algorithm combined with dynamic programming will find the optimal clustering. We also present NP-hardness results under a weaker but related condition.

arXiv.org e-Print Archive
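The "Single-Linkage plus dynamic programming" approach can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact procedure. It builds the single-linkage tree by Kruskal-style merging, then runs a DP that picks the cheapest way to cut the tree into $k$ subtrees. The cost objective is discrete $k$-median (centers restricted to data points), and the sketch assumes, as the abstract indicates for stable instances, that the optimal clusters appear as subtrees of that tree. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

# Tree node: (member indices, left child, right child); leaves have no children.
Node = Tuple[FrozenSet[int], Optional["Node"], Optional["Node"]]


def single_linkage_tree(points: Sequence, dist: Callable) -> Node:
    """Merge the two closest clusters until one remains (Kruskal on sorted edges)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    nodes: Dict[int, Node] = {i: (frozenset([i]), None, None) for i in range(len(points))}
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            nodes[ri] = (nodes[ri][0] | nodes[rj][0], nodes[ri], nodes[rj])
            del nodes[rj]
            parent[rj] = ri
    (root,) = nodes.values()
    return root


def kmedian_cost(members: FrozenSet[int], points: Sequence, dist: Callable) -> float:
    """Cost of one cluster with its best center chosen among its own points."""
    return min(sum(dist(points[c], points[x]) for x in members) for c in members)


def best_pruning(node: Node, k: int, points: Sequence, dist: Callable,
                 memo: Optional[dict] = None) -> Tuple[float, List[FrozenSet[int]]]:
    """Cheapest split of node's points into k clusters that are subtrees of node."""
    memo = {} if memo is None else memo
    key = (node[0], k)
    if key in memo:
        return memo[key]
    members, left, right = node
    if k == 1:
        result = (kmedian_cost(members, points, dist), [members])
    elif left is None or k > len(members):
        result = (math.inf, [])
    else:
        result = (math.inf, [])
        for k1 in range(1, k):  # give k1 clusters to the left subtree, k - k1 to the right
            c1, p1 = best_pruning(left, k1, points, dist, memo)
            c2, p2 = best_pruning(right, k - k1, points, dist, memo)
            if c1 + c2 < result[0]:
                result = (c1 + c2, p1 + p2)
    memo[key] = result
    return result
```

Usage: `best_pruning(single_linkage_tree(points, dist), k, points, dist)` returns the cost and the chosen clusters. Swapping `kmedian_cost` for another center-based objective leaves the tree and DP unchanged.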

Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model

Author: Blum Avrim
Kalai Adam
Wasserman Hal
Publication date: 01/01/2000

We describe a slightly sub-exponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first $O(\log n \log\log n)$ bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly($n$)-time algorithm for decoding linear $k \times n$ codes in the presence of random noise for the case of $k = c \log n \log\log n$ for some $c > 0$. (The case of $k = O(\log n)$ is trivial, since one can just individually check each of the $2^k$ possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve $t$-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with $t$-wise queries for $t = O(\log n)$ is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.

arXiv.org e-Print Archive

CiteSeerX
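The "trivial" case mentioned in the abstract, checking all $2^k$ messages, can be sketched directly. This shows only that baseline, not the paper's sub-exponential algorithm. It assumes a hypothetical generator matrix `G` given as a list of $k$ rows of $n$ bits, with encoding defined as message times `G` over GF(2). Decoding takes $O(2^k \cdot kn)$ time, which is polynomial in $n$ when $k = O(\log n)$.

```python
from itertools import product
from typing import List, Sequence, Tuple


def encode(message: Sequence[int], G: Sequence[Sequence[int]]) -> List[int]:
    """Codeword = message * G over GF(2)."""
    k, n = len(G), len(G[0])
    return [sum(message[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def brute_force_decode(received: Sequence[int], G: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """Try all 2^k messages; return the first whose codeword is closest to `received`."""
    return min(product((0, 1), repeat=len(G)),
               key=lambda m: hamming(encode(m, G), received))
```

In learning terms, each candidate message is a candidate parity, and picking the closest codeword means picking the parity that disagrees with the fewest noisy labels.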